The University of Arizona maintains an HPC center, which houses three compute resources: El Gato, Ocelote, and Puma.
| Name | El Gato | Ocelote | Puma |
|---|---|---|---|
| Model | IBM System X iDataPlex dx360 M4 | Lenovo NeXtScale nx360 M5 | Penguin Altus XE2242 |
| Node Count | 131 | 400 | 236 CPU-only; 8 GPU; 2 high-memory |
| Total System Memory | 26 TB | 82.6 TB | 128 TB |
| Processors | 2x Xeon E5-2650v2 8-core (Ivy Bridge) | 2x Xeon E5-2695v3 14-core (Haswell); 2x Xeon E5-2695v4 14-core (Broadwell); 4x Xeon E7-4850v2 12-core (Ivy Bridge) | 2x AMD EPYC 7642 48-core (Rome) |
| Cores / Node (schedulable) | 16c | 28c (48c on the high-memory node) | 94c |
| Total Cores | 2160* | 11528* | 23616* |
| Processor Speed | 2.66 GHz | 2.3 GHz (2.4 GHz on Broadwell CPUs) | 2.4 GHz |
| Memory / Node | 256 GB (GPU nodes); 64 GB (CPU-only nodes) | 192 GB (2 TB on the high-memory node) | 512 GB (3 TB on high-memory nodes) |
| Accelerators | | 46 NVIDIA P100 (16 GB) | 29 NVIDIA V100S |
| /tmp | ~840 GB spinning (part of the root filesystem) | ~840 GB spinning (part of the root filesystem) | ~1440 GB NVMe |
| HPL Rmax (TFlop/s) | 46 | 382 | |
| OS | CentOS 7 | CentOS 7 | CentOS 7 |
The UArizona HPC provides three types of resources:
*Note: High priority is only available for the Puma cluster.
PhytoOracle is a scalable, modular phenomics data processing workflow manager. In short, this means that PhytoOracle can leverage high-performance computing (HPC) clusters and cloud computing to distribute tasks across hundreds to thousands of cores.
Resources are defined in the `workload_manager` section of the PhytoOracle YAML file. In this section, you can define many compute resource settings. Below is an example:
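To review the resource settings an existing YAML file will use before submitting it, one option is to print only its `workload_manager` block. This is a minimal sketch, assuming `workload_manager` is a top-level key whose settings are indented beneath it; `show_workload_manager` is a hypothetical helper, not part of PhytoOracle, and the keys in any output come from your file, not from this sketch.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of PhytoOracle): print the top-level
# "workload_manager:" block of a YAML file, stopping at the next top-level key.
# This is a plain-text filter, not a YAML parser.
show_workload_manager() {
  local yaml_file=$1
  awk '
    /^workload_manager:/ { in_section = 1; print; next }  # start of the block
    in_section && /^[^ \t#]/ { exit }                      # next top-level key ends it
    in_section { print }                                   # indented settings, blank lines, comments
  ' "$yaml_file"
}
```

For example, `show_workload_manager yaml_files/season_15/stereoTop_level01_s15.yaml` prints that file's resource block. Flow-style mappings or YAML anchors are not resolved; open the file directly if the output looks incomplete.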
There are a few things you must confirm before deploying PhytoOracle:

- Confirm the existence and accuracy of the GCP file.
  - Visually inspect it in QGIS. Confirm correct placement of the GCPs by overlaying the points on an RGB orthomosaic, from either the drone or the gantry.
- Confirm the existence and accuracy of the GeoJSON file.
  - Visually inspect it in QGIS, checking the plot number sequence and genotype values.
If these steps are not followed, errors can propagate to multiple levels of data processing, requiring the data to be reprocessed.
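Part of the "existence" check can be scripted before the visual inspection. The sketch below is hypothetical (not part of PhytoOracle): it only verifies that both files exist and are non-empty, and that the GeoJSON parses as a `FeatureCollection` with at least one feature. It assumes `python3` is on the `PATH` and does not check GCP placement, plot numbering, or genotype values, so the QGIS inspection is still required.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight helpers (not part of PhytoOracle). They only catch
# missing, empty, or malformed files; they do NOT replace inspection in QGIS.

# Succeed if the file parses as JSON and is a FeatureCollection with >= 1 feature.
is_valid_geojson() {
  python3 -c '
import json, sys
try:
    with open(sys.argv[1]) as fh:
        data = json.load(fh)
except (OSError, ValueError):
    sys.exit(1)
ok = (isinstance(data, dict)
      and data.get("type") == "FeatureCollection"
      and len(data.get("features") or []) > 0)
sys.exit(0 if ok else 1)
' "$1"
}

# Usage: check_inputs <gcp_file> <geojson_file>; returns non-zero on any problem.
check_inputs() {
  local gcp_file=$1 geojson_file=$2 status=0
  for f in "$gcp_file" "$geojson_file"; do
    if [ ! -s "$f" ]; then
      echo "missing or empty: $f" >&2
      status=1
    fi
  done
  if [ -s "$geojson_file" ] && ! is_valid_geojson "$geojson_file"; then
    echo "not a non-empty GeoJSON FeatureCollection: $geojson_file" >&2
    status=1
  fi
  return "$status"
}
```

The file names passed to `check_inputs` are whatever your season uses; none are assumed here.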
The Field Scanner collects two-dimensional (2D) and three-dimensional (3D) data types, including scanner3DTop (3D), stereoTop (RGB), ps2Top (fluorescence), and flirIrCamera (thermal) (Figure @ref(fig:dataplot)).
Data types collected by the Field Scanner. Two-dimensional (2D) data types include RGB, fluorescence, and thermal images, while three-dimensional (3D) data types include point clouds.
The 2D data collected by the Field Scanner include stereoTop (RGB), flirIrCamera (thermal), and ps2Top (fluorescence). These data are processed relatively quickly because they are much smaller than three-dimensional (3D) data. Processing of all 2D data types is fully developed for both lettuce and sorghum (Figure @ref(fig:2dprocessingplot)).
Visualization of 2D data processing by PhytoOracle.
The major goal of the PhytoOracle project is to phenotype individual plants at high spatial and temporal resolution. To accomplish this, individual plant positions (GPS coordinates) collected during 2D data processing are used to extract each plant's data from the 3D data (Figure @ref(fig:3dprocessingplot)).
Visualization of 3D data processing by PhytoOracle.
As such, much focus has been placed on 3D point cloud data. These data undergo intensive processing to extract individual plant point clouds (Figure @ref(fig:pointcloudplot)).
Individual plant point clouds processed by PhytoOracle.
After (i) checking the GCP and GeoJSON files (Section @ref(prep)) and (ii) generating a YAML file (Section @ref(yaml)), you are now ready to run PhytoOracle.
PhytoOracle is made up of multiple workflows to process 2D and 3D data (Figure @ref(fig:workflowplot)). These workflows allow for automated, scalable processing of raw data collected by the Field Scanner, yielding phenotype information at high spatial and temporal resolution.
PhytoOracle workflows for processing raw data collected by the Field Scanner.
PhytoOracle is mainly deployed on the UArizona HPC. The following sections briefly describe how to run each workflow. For additional details, please refer to the PhytoOracle publication. In all cases, the commands provided automatically handle all steps of processing, including:
The stereoTop workflow runs image stitching and plant detection, resulting in the extraction of bounding area and GPS coordinate information for each plant. The workflow is run as follows:
```bash
sbatch shell_scripts/slurm_submission_large.sh <yaml_file>
```

For example, if you wanted to run this for season 15:

```bash
sbatch shell_scripts/slurm_submission_large.sh yaml_files/season_15/stereoTop_level01_s15.yaml
```
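The example paths in this guide follow the pattern `yaml_files/season_<N>/<sensor>_level<NN>_s<N>.yaml`. Assuming that pattern holds for other seasons and levels, a small wrapper can build the path, check that the file exists, and optionally print the `sbatch` command instead of running it. `yaml_path`, `submit_workflow`, and `DRY_RUN` are hypothetical names introduced here for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical helpers, assuming every YAML follows the naming pattern seen in
# this guide: yaml_files/season_<N>/<sensor>_level<NN>_s<N>.yaml

# Print the YAML path for a season, sensor, and processing level.
yaml_path() {
  local season=$1 sensor=$2 level=$3
  printf 'yaml_files/season_%s/%s_level%02d_s%s.yaml\n' "$season" "$sensor" "$level" "$season"
}

# Usage: submit_workflow <submission_script> <season> <sensor> <level>
# Set DRY_RUN=1 to print the sbatch command instead of running it.
submit_workflow() {
  local script=$1 yaml
  yaml=$(yaml_path "$2" "$3" "$4")
  if [ ! -f "$yaml" ]; then
    echo "YAML not found: $yaml" >&2
    return 1
  fi
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo sbatch "$script" "$yaml"
  else
    sbatch "$script" "$yaml"
  fi
}
```

Run from the repository root, `DRY_RUN=1 submit_workflow shell_scripts/slurm_submission_large.sh 15 stereoTop 1` prints the same command as the season 15 example; dropping `DRY_RUN=1` submits it.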
The flirIrCamera workflow runs image stitching and plant detection, resulting in the extraction of canopy temperature and GPS coordinate information for each plant. The workflow is run as follows:
```bash
sbatch shell_scripts/slurm_submission_large.sh <yaml_file>
```

For example, if you wanted to run this for season 15:

```bash
sbatch shell_scripts/slurm_submission_large.sh yaml_files/season_15/flirIrCamera_level01_s15.yaml
```
The ps2Top workflow applies a threshold to fluorescence plot-centered images, resulting in the extraction of the maximum potential quantum efficiency of Photosystem II (Fv/Fm). The workflow is run as follows:

```bash
sbatch shell_scripts/slurm_submission_large.sh <yaml_file>
```

For example, if you wanted to run this for season 15:

```bash
sbatch shell_scripts/slurm_submission_large.sh yaml_files/season_15/ps2Top_level01_s15.yaml
```
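Since the three 2D level 1 workflows are each submitted with the same script and differ only in their YAML file, they can be queued in one loop. This is a sketch, not part of PhytoOracle: `submit_2d_level1` is a hypothetical name, and it assumes the YAML files follow the season 15 naming pattern shown in the examples. Each iteration submits a separate job, so three large jobs will be queued at once.

```bash
#!/usr/bin/env bash
# Hypothetical convenience loop (not part of PhytoOracle): submit the three 2D
# level 1 workflows for one season, assuming the example YAML naming pattern.
# Usage: submit_2d_level1 <season>
submit_2d_level1() {
  local season=$1 sensor yaml
  for sensor in stereoTop flirIrCamera ps2Top; do
    yaml="yaml_files/season_${season}/${sensor}_level01_s${season}.yaml"
    if [ -f "$yaml" ]; then
      # Each call queues an independent Slurm job.
      sbatch shell_scripts/slurm_submission_large.sh "$yaml"
    else
      echo "skipping $sensor: $yaml not found" >&2
    fi
  done
}
```

For example, `submit_2d_level1 15` queues the stereoTop, flirIrCamera, and ps2Top jobs for season 15 (skipping any whose YAML is missing). Submit them one at a time instead if your allocation cannot absorb three large jobs together.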
The scanner3DTop workflow runs point cloud stitching and leverages GPS coordinates collected during stereoTop processing, resulting in the extraction of traditional and topological shape descriptors for each plant. This workflow involves multiple levels of processing, including:
For example, if you wanted to run level 1 processing for season 15:

```bash
sbatch shell_scripts/slurm_submission.sh yaml_files/season_15/scanner3DTop_level01_s15.yaml
```

To run level 2 processing for season 15:

```bash
sbatch shell_scripts/slurm_submission.sh yaml_files/season_15/scanner3DTop_level02_s15.yaml
```
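Instead of waiting for level 1 to finish before submitting level 2, Slurm can hold the second job until the first succeeds. The sketch below uses the standard `sbatch --parsable` and `--dependency=afterok:<jobid>` options. It assumes the level 1 manager job exits with status 0 only after all of its tasks have finished; verify this on a test run before relying on it. `submit_3d_chain` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Sketch: queue scanner3DTop level 2 so it starts only after level 1 succeeds.
# Assumes the level 1 manager job exits 0 only once all of its tasks are done.
# Usage: submit_3d_chain <season>
submit_3d_chain() {
  local season=$1 level1_id
  local yaml1="yaml_files/season_${season}/scanner3DTop_level01_s${season}.yaml"
  local yaml2="yaml_files/season_${season}/scanner3DTop_level02_s${season}.yaml"

  # --parsable makes sbatch print only "jobid" (or "jobid;cluster").
  level1_id=$(sbatch --parsable shell_scripts/slurm_submission.sh "$yaml1") || return 1
  level1_id=${level1_id%%;*}  # drop the optional ";cluster" suffix

  # afterok: level 2 starts only if the level 1 job completes with exit code 0.
  sbatch --dependency=afterok:"$level1_id" shell_scripts/slurm_submission.sh "$yaml2"
}
```

For example, `submit_3d_chain 15` submits both season 15 jobs at once. If level 1 fails, Slurm leaves level 2 pending with an unsatisfiable dependency, and it should be cancelled with `scancel`.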
*Note: scanner3DTop level 1 and 2 processing use shell_scripts/slurm_submission.sh instead of shell_scripts/slurm_submission_large.sh. This is because the manager node performs no processing; it merely distributes tasks to the worker nodes. As such, the manager node only requires two cores instead of 94.
Although PhytoOracle is reproducible due to the use of containers and YAML configuration files, it is important to follow quality control (QC) and quality assurance (QA) steps after data processing. The recommended steps for this are:
If any errors are spotted during these QA/QC steps, immediately notify the project lead. Depending on the impact of the error, data may need to be reprocessed to ensure data integrity.
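A sensible check before the QA/QC steps is confirming that the processing job itself finished cleanly in Slurm. The sketch below uses standard `sacct` options: `-X` reports only the job allocation (not individual steps), `-n` drops the header, and `-o State` selects the state column. It assumes a single, non-array job ID (for example, the manager job ID printed by `sbatch`); `job_completed` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Sketch: report a Slurm job's final state and succeed only if it is COMPLETED.
# Assumes a single (non-array) job ID, e.g. the PhytoOracle manager job.
# Usage: job_completed <job_id>
job_completed() {
  local job_id=$1 state
  # -X: allocation only, -n: no header, -o State: just the State column.
  state=$(sacct -j "$job_id" -X -n -o State | tr -d '[:space:]')
  echo "job $job_id: ${state:-UNKNOWN}"
  [ "$state" = COMPLETED ]
}
```

A `COMPLETED` state only means the job exited with status 0; it says nothing about whether the outputs are correct, so the QA/QC steps still apply.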